Efficient comparative phylogenetics on large trees

Authors

  • Stilianos Louca
  • Michael Doebeli
Abstract

Motivation: Biodiversity databases now comprise hundreds of thousands of sequences and trait records. For example, the Open Tree of Life includes over 1 491 000 metazoan and over 300 000 bacterial taxa. These data provide unique opportunities for analysis of phylogenetic trait distribution and reconstruction of ancestral biodiversity. However, existing tools for comparative phylogenetics scale poorly to such large trees, to the point of being almost unusable.

Results: Here we present a new R package, named 'castor', for comparative phylogenetics on large trees comprising millions of tips. On large trees castor is often 100-1000 times faster than existing tools.

Availability and implementation: The castor source code, compiled binaries, documentation and usage examples are freely available at the Comprehensive R Archive Network (CRAN).

Contact: [email protected]

Supplementary information: Supplementary data are available at Bioinformatics online.

Related resources

Methods and Architectures for Realizing Fast Phylogenetic Computation Engines Using VLSI Array Based Logic

Evaluating phylogenetic trees is an endeavor fundamental to comparative genomics and a core discipline of Bioinformatics. However, with single trees taking up to a week on the fastest processor under general models of evolution and the number of trees growing exponentially with the number of sequences analyzed, this is an exceptionally computationally intensive endeavor. There has been much wo...

Graphical Methods for Visualizing Comparative Data on Phylogenies

Phylogenies have emerged as central in evolutionary biology over the past three decades or more, and an extraordinary expansion in the breadth and sophistication of phylogenetic comparative methods has played a large role in this growth. In this chapter, I focus on a somewhat neglected area: the use of graphical methods to simultaneously represent comparative data and trees. As this research ar...

Confirming the Phylogeny of Mammals by Use of Large Comparative Sequence Data Sets

The ongoing generation of prodigious amounts of genomic sequence data from myriad vertebrates is providing unparalleled opportunities for establishing definitive phylogenetic relationships among species. The size and complexities of such comparative sequence data sets not only allow smaller and more difficult branches to be resolved but also present unique challenges, including large computatio...

Comparative Cultural Phylogenetics and the Transmission of Belief in an Oral Society

Cultural transmission typically results in a network of connections between cultural units (such as individuals or social groups), not the branching patterns of descent seen in genetic inheritance. As a consequence, the application of phylogenetic (or evolutionary clustering) methods to cultural history faces methodological problems. Most importantly, the history of relationships inferred throu...

A New Distance-based Approach for Phylogenetic Analysis of Protein Sequences

With the availability of ever-increasing gene and protein sequence data across a large number of species, reconstruction of phylogenetic trees to reveal the evolutionary relationship among those species becomes more and more important. In this paper, we take the physicochemical properties of amino acids into account and introduce the protein feature sequences into phylogenetic analysis by using...

Journal:
  • Bioinformatics

Volume 34, Issue 6

Pages: -

Publication date: 2018